Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics
نویسندگان
چکیده
Labeled graphs provide a natural way of representing entities, relationships and structures within real datasets such as knowledge graphs and protein interactions. Applications such as question answering, semantic search, and motif discovery entail efficient approaches for subgraph matching involving both label and structural similarities. Given the NP-completeness of subgraph isomorphism and the presence of noise, approximate graph matching techniques are required to handle queries in a robust and real-time manner. This paper presents a novel technique to characterize the subgraph similarity based on statistical significance captured by chi-square statistic. The statistical significance model takes into account the background structure and label distribution in the neighborhood of vertices to obtain the best matching subgraph and, therefore, robustly handles partial label and structural mismatches. Based on the model, we propose two algorithms, VELSET and NAGA, that, given a query graph, return the top-k most similar subgraphs from a (large) database graph. While VELSET is more accurate and robust to noise, NAGA is faster and more applicable for scenarios with low label noise. Experiments on large real-life graph datasets depict significant improvements in terms of accuracy and running time in comparison to the state-of-the-art methods.
منابع مشابه
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
متن کاملStructure and attribute index for approximate graph matching in large graphs
The increasing popularity of graph data in various domains has lead to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query gr...
متن کاملEFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...
متن کاملThe matching interdiction problem in dendrimers
The purpose of the matching interdiction problem in a weighted graph is to find two vertices such that the weight of the maximum matching in the graph without these vertices is minimized. An approximate solution for this problem has been presented. In this paper, we consider dendrimers as graphs such that the weights of edges are the bond lengths. We obtain the maximum matching in some types of...
متن کاملON THE MATCHING NUMBER OF AN UNCERTAIN GRAPH
Uncertain graphs are employed to describe graph models with indeterministicinformation that produced by human beings. This paper aims to study themaximum matching problem in uncertain graphs.The number of edges of a maximum matching in a graph is called matching numberof the graph. Due to the existence of uncertain edges, the matching number of an uncertain graph is essentially an uncertain var...
متن کامل